Abstract

This paper tackles algorithmic and theoretical aspects of dictionary learning from incomplete and random block- 
wise image measurements and the performance of the adaptive dictionary for sparse image recovery. This problem is 
related to blind compressed sensing in which the sparsifying dictionary or basis is viewed as an unknown variable and 
subject to estimation during sparse recovery. However, unlike existing guarantees for a successful blind compressed 
sensing, our results do not rely on additional structural constraints on the learned dictionary or the measured signal. 
In particular, we rely on the spatial diversity of compressive measurements to guarantee that the solution is unique 
with a high probability. Moreover, our distinguishing goal is to measure and reduce the estimation error with respect 
to the ideal dictionary that is based on the complete image. Using recent results from random matrix theory, we 
show that applying a slightly modified dictionary learning algorithm over compressive measurements results in 
accurate estimation of the ideal dictionary for large-scale images. Empirically, we experiment with both space- 
invariant and space-varying sensing matrices and demonstrate the critical role of spatial diversity in measurements. 
Simulation results confirm that the presented algorithm outperforms the typical non-adaptive sparse recovery based 
on offline-learned universal dictionaries. 


Index Terms 

Blind compressed sensing, dictionary learning, sensor diversity, adaptive image recovery. 


I. Introduction 

The theory of Compressed Sensing (CS) establishes that the combinatorial problem of recovering the sparsest
vector from a limited number of linear measurements can be solved in polynomial time, provided that the measurements
satisfy certain isometry conditions [20]. CS can be applied directly to recover signals that are naturally sparse
in the standard basis. CS has also been extended to the many types of natural signals that
can be represented by a sparse vector using a dictionary [1]. As an alternative to model-based dictionaries such
as wavelets [2], Dictionary Learning (DL) [3] is a data-driven algorithmic approach to building sparse representations
for natural signals.

Learning dictionaries over large-scale databases of training images is a time- and memory-intensive process
that results in a universal dictionary that works for most types of natural images. Meanwhile, several variations
of DL algorithms have been proposed for real-time applications to make the sparse representations more adaptive^1
in applications such as image denoising [4], image inpainting [5] and, most recently, compressed sensing [6].
In particular, the last application has been termed Blind Compressed Sensing (BCS) to differentiate it from the normal
CS where the dictionary is assumed to be known and fixed. Clearly, one would expect BCS to improve CS recovery
when the optimal sparsity basis is unknown, which is the case in most real-world applications. Unfortunately, the
existing work on BCS for imaging is lacking in two directions: (a) empirical evaluations and (b) mathematical
justification for the general case. These issues are discussed further below.

• Empirical evaluations: In existing BCS works, such as [7], empirical evaluations on images are mainly limited
to the image inpainting problem, which can be viewed as a CS problem where the compressive measurements
are in the standard basis. In [8], the generic CS problem is only tested on artificially generated sparse vectors.
The tested images and running scenarios for the algorithms in [7]-[10] are rather limited and arguably
not adequate for indicating the strengths and weaknesses of BCS in real-world imaging applications. Finally,
and most importantly, existing studies fail to compare the adaptive BCS recovery with the non-adaptive CS
recovery based on universally learned dictionaries.

• Mathematical justification: The original BCS effort [6] identifies the general unconstrained BCS problem
as ill-posed. Subsequently, various structural constraints were proposed for the learned dictionary and were
shown to ensure uniqueness at the cost of decreased flexibility. In a following effort [7], a different strategy
was used to ensure uniqueness without enforcing structural constraints on the dictionary. However,
uniqueness was only justified for the class of sparse signals that admit the block-sparse model, by exploiting
recent results from the area of low-rank matrix completion [11]. Finally, [8] and [9] take empirical approaches
toward unconstrained DL based on compressive measurements but do not provide any justification for the
uniqueness, convergence or accuracy of the proposed DL algorithm.


M. Aghagolzadeh and H. Radha are with the Department of Electrical and Computer Engineering, Michigan State University, East Lansing,
MI, 48823 USA e-mail: {aghagoll, radha@msu.edu}.

1 We must emphasize the difference between adapting and learning dictionaries, although the two terms are sometimes used interchangeably.
In this paper, by dictionary learning we refer to its typical usage, i.e. the process of applying the DL algorithm to a large database of training
images to produce a universal dictionary. Adapting a dictionary is the process of applying the DL algorithm to only one or a small number
of possibly corrupted images to produce a dictionary that performs well specifically for those images.


The present work is different from existing efforts in BCS, both in terms of goals and methodology. In addition 
to the goal of having an objective function with a unique optimum, we would like the learned dictionary to be as 
close as possible to the ideal dictionary that is based on running the DL algorithm over the complete image. In 
other words, our goals include both convergence to a unique solution and high accuracy of the solution. Since no 
prior information is available about the structure of the ideal dictionary or the underlying signal, our method does
not impose extra structural constraints on the learned dictionary^2 or the sparse coefficients.

Similar to most efforts in the area of compressive imaging, including the BCS framework, we employ a block
compressed sensing or block-CS scheme for measurement and recovery of images [12], [13]. Unlike dense-CS, where
the image is recovered as a single vector using a dense sampling matrix, block-CS breaks down the high-dimensional
dense-CS problem into many small-sized CS problems, one for each non-overlapping block of the image.
Some advantages of block-CS are: (a) block-CS results in a block-diagonal sampling matrix, which significantly
reduces the amount of memory required for storing large-scale sampling matrices, (b) block-CS avoids decoding in
extremely high dimensions^3, which is computationally challenging in dense-CS, and (c) sparse modeling or learning
sparsifying dictionaries for high-dimensional global image characteristics is challenging and not well studied.
Specifically, we study a block-CS scheme where each block is sampled using a distinct sampling matrix and show
that it is superior to using a fixed block sampling matrix for BCS. One of our goals in this paper is to outperform
non-adaptive sparse image recovery using universal dictionaries based on well-known DL algorithms such as
online-DL [17] and K-SVD [4] while overcoming challenges such as overfitting. Rather than focusing on new DL
algorithms for BCS, we focus on the relationship between the block-CS measurements and the BCS performance in an
unconstrained setup.

This paper is organized as follows. In Section II, we review the dictionary learning problem under the settings of
complete and compressive data. Before describing the details of our algorithm in Section IV, we present our main
contributions regarding the uniqueness conditions and the DL accuracy in the presence of partial data in Section
III. Simulation results are presented in Section V. Finally, we present the conclusion and a discussion of future
directions in Section VI.


A. Notation 

Throughout the paper, we use the following rules. Upper-case letters are used for matrices and lower-case letters
are used for vectors and scalars. $I_n$ denotes the identity matrix of size $n$. We reserve the following notation: $N$
is the total number of blocks in an image, $n$ is the size of each block (e.g. an 8 x 8 block has size 64), $m$ is the
number of compressive measurements per block (usually $m < n$), $p$ is the number of atoms in a dictionary, $t$ is
the iteration count, $D \in \mathbb{R}^{n \times p}$ denotes a dictionary, $x_j \in \mathbb{R}^n$ represents the vectorized
(column-major) image block number $j$, $\alpha_j \in \mathbb{R}^p$ is the representation of $x_j$ (i.e. $x_j \approx D\alpha_j$),
$\Phi_j \in \mathbb{R}^{m \times n}$ denotes the measurement matrix for block number $j$, and $y_j \in \mathbb{R}^m$ denotes
the vector of compressive measurements (i.e. $y_j = \Phi_j x_j$). For simplicity, we drop the block index subscripts in
$\alpha_j$, $x_j$, $\Phi_j$ and $y_j$ when a single block is under consideration. Similarly, we omit the iteration
superscript in $\alpha^{(t)}$ and $D^{(t)}$ when a single iteration is under study and it does not create confusion.

2 The constraint of having bounded column norm or Frobenius-norm of the dictionary, which is used in virtually every dictionary learning
algorithm, does not constrain the dictionary structure other than bounding it or its columns inside the unit sphere. Some examples of structural
constraints, used in [6] and [10] respectively, are block-diagonal dictionaries and sparse dictionaries.

3 A typical consumer image has on the order of $10^6$ pixels, which would make it impractical to recover as a single vector using existing
sparse recovery methods with cubic or quadratic time complexities.

The vector $\ell_p$ norm is defined as $\|x\|_p = \left(\sum_i |x_i|^p\right)^{1/p}$. The matrix operators $A \otimes B$ and
$A \odot B$ respectively represent the Kronecker and the Hadamard (or element-wise) products. The operator $\mathrm{vec}(A)$
reshapes a matrix $A$ to its column-major vectorized format. The matrix inner product is defined as
$\langle A, B \rangle = \mathrm{Tr}(A^T B)$ with $\mathrm{Tr}(A)$ denoting the matrix trace, i.e. the sum of diagonal entries of $A$.
The Frobenius-norm of $A$ is defined as $\|A\|_F = \left(\sum_{ij} |A_{ij}|^2\right)^{1/2}$.
Finally, due to the frequent usage of Lasso regression [14] in this paper, we use the following abstractions:

$$\mathcal{L}(x, D, \lambda, \alpha) = \frac{1}{2}\|x - D\alpha\|_2^2 + \lambda\|\alpha\|_1$$

$$\mathcal{L}_{\min}(x, D, \lambda) = \min_{\alpha} \frac{1}{2}\|x - D\alpha\|_2^2 + \lambda\|\alpha\|_1$$

$$\mathcal{L}^{\arg}_{\min}(x, D, \lambda) = \arg\min_{\alpha} \frac{1}{2}\|x - D\alpha\|_2^2 + \lambda\|\alpha\|_1$$

In words, $\mathcal{L}_{\min}(x, D, \lambda)$ represents the model misfit and $\alpha^* = \mathcal{L}^{\arg}_{\min}(x, D, \lambda)$
denotes the sparse coefficient vector. Also, note the obvious relationship:

$$\mathcal{L}\left(x, D, \lambda, \mathcal{L}^{\arg}_{\min}(x, D, \lambda)\right) = \mathcal{L}_{\min}(x, D, \lambda)$$
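The three Lasso abstractions can be illustrated numerically. The following is a minimal sketch, assuming ISTA (iterative soft-thresholding) as a stand-in Lasso solver; the paper does not prescribe a solver, and the function names `lasso_objective`, `lasso_argmin` and `lasso_min` as well as the toy data are hypothetical.

```python
import numpy as np

def lasso_objective(x, D, lam, alpha):
    """L(x, D, lambda, alpha) = 0.5*||x - D alpha||_2^2 + lambda*||alpha||_1."""
    r = x - D @ alpha
    return 0.5 * float(r @ r) + lam * float(np.abs(alpha).sum())

def soft_threshold(v, tau):
    """Proximal operator of tau*||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def lasso_argmin(x, D, lam, n_iter=2000):
    """Approximate L^arg_min(x, D, lambda) with ISTA (an assumed solver choice)."""
    step = 1.0 / (np.linalg.norm(D, 2) ** 2)  # 1 / Lipschitz constant of the smooth term
    alpha = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ alpha - x)
        alpha = soft_threshold(alpha - step * grad, step * lam)
    return alpha

def lasso_min(x, D, lam, n_iter=2000):
    """Approximate L_min(x, D, lambda): the objective evaluated at the minimizer."""
    return lasso_objective(x, D, lam, lasso_argmin(x, D, lam, n_iter))

# Toy data (arbitrary sizes, chosen only for illustration).
rng = np.random.default_rng(0)
D = rng.standard_normal((16, 32)) / 4.0
x = rng.standard_normal(16)
lam = 0.1
alpha_star = lasso_argmin(x, D, lam)
```

By construction, `lasso_objective(x, D, lam, alpha_star)` coincides with `lasso_min(x, D, lam)`, which mirrors the relationship between $\mathcal{L}$, $\mathcal{L}^{\arg}_{\min}$ and $\mathcal{L}_{\min}$ stated above.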


II. Problem Statement and Prior Art 


A. An overview of the dictionary learning problem

The Dictionary Learning (DL) problem can be compactly expressed as:

$$D^* = \arg\min_{D} \psi(D) \qquad \text{(P1)}$$

where $\psi(D)$ represents the collective model misfit:

$$\psi(D) = \sum_{j=1}^{N} \mathcal{L}_{\min}(x_j, D, \lambda)$$

As noted in [3] for a similar formulation of DL, (P1) represents a bi-level optimization problem:

• The inner layer (also known as the lower level) problem consists of solving $N$ Lasso problems to get $\psi(D)$.

• The outer layer (or upper level) problem consists of finding a $D$ that minimizes $\psi(D)$.

Note that even for large-scale images ($N \gg 1$) the lower level optimization can be handled efficiently by parallel
programming because each block is processed independently. However, in a batch-DL algorithm^5, in contrast to
online-DL [17], the upper level problem is centralized and combines the information collected from all blocks. In
this paper, we use the batch-DL approach to stay consistent with the mathematical analysis. Nevertheless, an
efficient algorithm for solving the batch problem is described in Section IV. Similar to [17], the batch algorithm and its
analysis can be extended to online-DL for the best efficiency. Hereafter, we omit the prefix 'batch-' in batch-DL
for simplicity.

The typical DL strategy that is used in most works Q, [5], [17], [18], [21]-[23] is to iterate between the inner
and outer optimization problems until convergence to a local minimum. Expressed formally, the iterative procedure
is:

$$D^{(t+1)} = \arg\min_{D} \psi^{(t)}(D) \qquad (1)$$

with

$$\psi^{(t)}(D) = \sum_{j=1}^{N} \mathcal{L}\left(x_j, D, \lambda, \alpha_j^{(t)}\right)$$

$$\alpha_j^{(t)} = \mathcal{L}^{\arg}_{\min}\left(x_j, D^{(t)}, \lambda\right)$$

4 There are alternative ways of expressing the DL problem that are related. For example, the authors in [15] propose to minimize the sum of
$\ell_1$ norms of coefficient vectors for a fixed (zero) representation error for each sample $x_j$.

5 Batch processing refers to the processing of all $N$ blocks at once while online processing refers to the one-by-one processing of blocks
in a streaming fashion.
The algorithm starts from an initial dictionary $D = D^{(0)}$ that can be selected to be, for example, the overcomplete
discrete cosine frame Q.

Perhaps surprisingly, the solution of (P1) is trivial without the additional constraint of having a bounded dictionary
norm. To explain, one can always reduce the model misfit $\mathcal{L}(x, D, \lambda, \alpha)$ by multiplying $D$ with a scalar $s > 1$
and multiplying $\alpha$ with $1/s$, thus reducing the $\ell_1$ norm of $\alpha$ while keeping $x - D\alpha$ fixed, leading to
$\|D^*\|_F \to \infty$ and $\|\alpha\|_2 \to 0$. There are two typical bounding methods to solve this issue, reviewed in e.g. [24], [25]:
(a) bounding the $\ell_2$ norm of each dictionary column or (b) bounding $\|D\|_F$. In this work, we use the second
approach, i.e. the bound $\|D\|_F \le \text{const.}$, since it does not enforce a uniform distribution of column norms, which
makes the sparse representation more adaptive. As pointed out in [25], using a Frobenius-norm bound results in
a weighted sparse representation problem (at the inner level) where some coefficients can have priority over
others in taking non-zero values. Additionally, having bounded column norms is a stronger constraint, which makes
the analysis more difficult when the dictionary is treated in its vectorized format (this becomes clearer in Section
III). The typical method for bounding the dictionary is to project the updated dictionary (at the end of each
iteration) back inside the constraint set. More details are provided in Section IV where we describe the DL algorithm.
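The scaling argument above can be checked numerically. The sketch below uses arbitrary toy data; the helper names `lasso_objective` and `project_frobenius` are hypothetical. It shows that replacing $(D, \alpha)$ by $(sD, \alpha/s)$ leaves the residual unchanged while shrinking the $\ell_1$ penalty, and that rescaling to a fixed Frobenius norm removes this freedom.

```python
import numpy as np

def lasso_objective(x, D, lam, alpha):
    """0.5*||x - D alpha||_2^2 + lam*||alpha||_1."""
    r = x - D @ alpha
    return 0.5 * float(r @ r) + lam * float(np.abs(alpha).sum())

def project_frobenius(D, c):
    """Rescale D so that ||D||_F = c (projection onto the Frobenius sphere)."""
    return D * (c / np.linalg.norm(D, "fro"))

rng = np.random.default_rng(1)
D = rng.standard_normal((8, 12))
alpha = rng.standard_normal(12)
x = rng.standard_normal(8)
lam = 0.5

base = lasso_objective(x, D, lam, alpha)
# Scaling D up and alpha down keeps D @ alpha fixed but reduces ||alpha||_1.
scaled = [lasso_objective(x, s * D, lam, alpha / s) for s in (2.0, 10.0, 100.0)]

D_bounded = project_frobenius(D, 3.0)
```

The values in `scaled` decrease as `s` grows, which is the degenerate behavior that the norm bound is meant to rule out.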


B. Dictionary learning from compressive measurements 

The problem of CS is to recover a sparse signal $x \in \mathbb{R}^n$, or a signal that can be approximated by a sparse
vector, from a set of linear measurements $y = \Phi x \in \mathbb{R}^m$. When $m < n$, the linear system is under-determined
and the solution set is infinite. However, it is not difficult to show that for a sufficiently sparse $x$ the solution to
$y = \Phi x$ is unique [26]. Unfortunately, the problem of searching for the sparsest $x$ subject to $y = \Phi x$ is NP-hard and
impractical to solve for high-dimensional $x$. Meanwhile, the CS theory indicates that this problem can be solved
in polynomial time, using sparsity-promoting solvers such as Lasso [14], given that $\Phi$ satisfies the Restricted
Isometry Property (RIP) [20].

CS also applies to a dense $x$ when it has a sparse representation of the form $x = D\alpha$ (with a sparse $\alpha$).
Measurements can be expressed as $y = P\alpha$, where $P = \Phi D$ is called the projection matrix. It has been shown
that most random designs of $\Phi$ would yield RIP with high probability [27]. The compressive imaging problem
can be expressed as:

$$\hat{x} = D\,\mathcal{L}^{\arg}_{\min}(y, \Phi D, \lambda) \qquad (2)$$

The well-known basis pursuit signal recovery corresponds to the following asymptotic solution [31]:

$$\hat{x} = \lim_{\lambda \to 0^+} D\,\mathcal{L}^{\arg}_{\min}(y, \Phi D, \lambda) \qquad (3)$$


Hereafter, we focus on the block-CS framework where each $x_j$ represents an image block and $y_j = \Phi_j x_j$
represents the vector of compressive measurements for that block. The iterative DL procedure based on block-CS
measurements can be written as:

$$D^{(t+1)} = \arg\min_{D} \hat{\psi}^{(t)}(D) \qquad (4)$$

with

$$\hat{\psi}^{(t)}(D) = \sum_{j=1}^{N} \mathcal{L}\left(y_j, \Phi_j D, \lambda, \alpha_j^{(t)}\right)$$

$$\alpha_j^{(t)} = \mathcal{L}^{\arg}_{\min}\left(y_j, \Phi_j D^{(t)}, \lambda\right)$$

To distinguish (4) from the normal DL in (1) and other BCS formulations [6]-[8], we refer to this problem as
Dictionary Learning from linear Measurements or simply DL-M.

We could also arrange the block-wise measurements into a single system of linear equations:

$$\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix} = \Phi\,(I_N \otimes D) \begin{bmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_N \end{bmatrix} \qquad (5)$$

where

$$\Phi = \begin{bmatrix} \Phi_1 & & & \\ & \Phi_2 & & \\ & & \ddots & \\ & & & \Phi_N \end{bmatrix} \qquad (6)$$

represents the block-diagonal measurement matrix. Our results can be easily extended to dense-CS, i.e. CS with
a dense $\Phi$. Although dense-CS would not allow the sequential processing of blocks required by an
online-DL framework, a batch-DL framework is compatible with dense-CS.
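As a sanity check of the stacked formulation, the following sketch builds a block-diagonal measurement matrix from hypothetical per-block matrices and confirms that it reproduces the per-block products $\Phi_j D \alpha_j$. All sizes and random data are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
N, n, m, p = 4, 9, 5, 12          # blocks, block size, measurements per block, atoms

D = rng.standard_normal((n, p))
Phis = [rng.standard_normal((m, n)) / np.sqrt(m) for _ in range(N)]
alphas = [rng.standard_normal(p) for _ in range(N)]

# Block-wise measurements y_j = Phi_j D alpha_j, concatenated into one vector.
y_blocks = np.concatenate([Phis[j] @ D @ alphas[j] for j in range(N)])

# Stacked system: block-diagonal Phi times (I_N kron D) times stacked coefficients.
Phi = np.zeros((N * m, N * n))
for j in range(N):
    Phi[j * m:(j + 1) * m, j * n:(j + 1) * n] = Phis[j]
y_stacked = Phi @ np.kron(np.eye(N), D) @ np.concatenate(alphas)
```

The block-diagonal `Phi` stores only `N*m*n` non-zero entries out of `N*m*N*n`, which is the memory saving attributed to block-CS earlier in the paper.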

III. Mathematical Analysis 

The benefits of using a distinct $\Phi_j$ for each block can be understood intuitively Q. However, it is important to
study the asymptotic behavior of DL when $N \to \infty$, as well as the non-asymptotic bounds for a finite $N$.

In the first part of this section, we prove that the iterative DL-M algorithm returns a unique solution with a
probability that approaches one for large $N$. Specifically, we show that the outer problem, known as the 'dictionary
update' stage,

$$D^{(t+1)} = \arg\min_{D} \hat{\psi}^{(t)}(D)$$

has a unique solution for fixed $\alpha_j^{(t)}$'s and also that every inner problem

$$\forall j: \; \alpha_j^{(t)} = \mathcal{L}^{\arg}_{\min}\left(y_j, \Phi_j D^{(t)}, \lambda\right)$$

has a unique solution for a fixed $D^{(t)}$. Therefore, starting from an initial point $D = D^{(0)}$, the sequence of DL-M iterations
forms a unique path.

To specify the accuracy of the DL-M algorithm, we measure the expectation

$$\mathbb{E}\left\{ \left\| \arg\min_{D} \hat{\psi}^{(t)}(D) - \arg\min_{D} \psi^{(t)}(D) \right\|_F^2 \right\} \qquad (7)$$

starting from the same (fixed) $\alpha_j^{(t)}$'s. Meanwhile, the inner problem is precisely a noisy CS problem and its accuracy
has been thoroughly studied [1]. Specifically, when $m = O(k_j \log p)$, where $k_j$ denotes the sparsity of $\alpha_j^{(t)}$, the
inner CS problem for block $j$ can be solved exactly. The presented error analysis for a finite $N$ is limited to a
single iteration of dictionary update. Nevertheless, the asymptotic conclusion that we present is that the DL-M and
DL solutions converge as $N$ approaches infinity.

Based on the above remarks, our analysis is focused on a single iteration of DL-M. Therefore, for simplicity,
we drop the iteration superscript in the rest of this section unless required.

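First, we write $\hat{\psi}(D)$ in the standard quadratic format:

$$\begin{aligned}
\hat{\psi}(D) &= \frac{1}{2}\sum_{j=1}^{N} \|y_j - \Phi_j D\alpha_j\|_2^2 + \lambda\sum_{j=1}^{N} \|\alpha_j\|_1 \\
&= \frac{1}{2}\sum_{j=1}^{N} y_j^T y_j + \lambda\sum_{j=1}^{N} \|\alpha_j\|_1 + \frac{1}{2}\sum_{j=1}^{N} \alpha_j^T D^T \Phi_j^T \Phi_j D\alpha_j - \sum_{j=1}^{N} y_j^T \Phi_j D\alpha_j
\end{aligned}$$

We can further write:

$$\begin{aligned}
\alpha_j^T D^T \Phi_j^T \Phi_j D\alpha_j &= \mathrm{Tr}\left(\alpha_j^T D^T \Phi_j^T \Phi_j D\alpha_j\right) \\
&= \mathrm{Tr}\left(D^T \Phi_j^T \Phi_j D\alpha_j\alpha_j^T\right) \\
&= \left\langle D, \Phi_j^T \Phi_j D\alpha_j\alpha_j^T \right\rangle \\
&= \mathrm{vec}(D)^T \mathrm{vec}\left(\Phi_j^T \Phi_j D\alpha_j\alpha_j^T\right) \\
&= \mathrm{vec}(D)^T \left(\alpha_j\alpha_j^T \otimes \Phi_j^T \Phi_j\right) \mathrm{vec}(D)
\end{aligned}$$

and

$$\begin{aligned}
y_j^T \Phi_j D\alpha_j &= \mathrm{Tr}\left(y_j^T \Phi_j D\alpha_j\right) \\
&= \mathrm{Tr}\left(\alpha_j y_j^T \Phi_j D\right) \\
&= \left\langle \Phi_j^T y_j \alpha_j^T, D \right\rangle \\
&= \mathrm{vec}\left(\Phi_j^T y_j \alpha_j^T\right)^T \mathrm{vec}(D)
\end{aligned}$$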
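The two vectorization identities used in this derivation can be verified numerically. This is a small sketch on random data (sizes are arbitrary assumptions); `vec` is implemented with column-major ordering to match the convention stated in the Notation subsection.

```python
import numpy as np

rng = np.random.default_rng(3)
n, m, p = 6, 4, 8
D = rng.standard_normal((n, p))
Phi = rng.standard_normal((m, n)) / np.sqrt(m)
alpha = rng.standard_normal(p)
y = rng.standard_normal(m)

def vec(A):
    """Column-major vectorization, matching vec(.) in the text."""
    return A.reshape(-1, order="F")

d = vec(D)

# Quadratic term: alpha^T D^T Phi^T Phi D alpha = d^T (alpha alpha^T kron Phi^T Phi) d
quad_direct = alpha @ D.T @ Phi.T @ Phi @ D @ alpha
quad_kron = d @ np.kron(np.outer(alpha, alpha), Phi.T @ Phi) @ d

# Linear term: y^T Phi D alpha = vec(Phi^T y alpha^T)^T d
lin_direct = y @ Phi @ D @ alpha
lin_vec = vec(Phi.T @ np.outer(y, alpha)) @ d
```

Both pairs agree up to floating-point error, which relies on the identity $\mathrm{vec}(AXB) = (B^T \otimes A)\,\mathrm{vec}(X)$ with column-major `vec`.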


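Letting $d = \mathrm{vec}(D)$ and $\hat{g}(d) = \hat{\psi}(D)$, the standard quadratic form of $\hat{\psi}(D)$ can be written as:

$$\hat{g}(d) = \frac{1}{2} d^T \hat{Q} d + \hat{f}^T d + \hat{c} \qquad (8)$$

with

$$\hat{Q} = \sum_{j=1}^{N} \alpha_j\alpha_j^T \otimes \Phi_j^T \Phi_j$$

$$\hat{f} = -\sum_{j=1}^{N} \mathrm{vec}\left(\Phi_j^T y_j \alpha_j^T\right)$$

$$\hat{c} = \frac{1}{2}\sum_{j=1}^{N} \|y_j\|_2^2 + \lambda\sum_{j=1}^{N} \|\alpha_j\|_1$$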

Next, we shall specify the stochastic construction of block-CS measurements that we term the BIG measurement
scheme.

Definition 1. (BIG measurement) In a Block-based Independent Gaussian or BIG measurement scheme, each
entry $(k, l)$ of each block measurement matrix, $[\Phi_j]_{kl}$, is independently drawn from a zero-mean Gaussian
distribution with variance $1/m$.

The $1/m$ variance guarantees that $\mathbb{E}\{\Phi_j^T \Phi_j\} = I_n$. Note that although our analysis focuses on Gaussian
measurements, it is straightforward to extend it to the larger class of sub-Gaussian measurements, which includes
the Rademacher and the general class of (centered) bounded random variables [27].
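The role of the $1/m$ variance can be illustrated with a short Monte Carlo sketch (the sample sizes and the helper name `big_matrices` are assumptions for illustration). Averaging $\Phi_j^T \Phi_j$ over many independently drawn blocks approaches $I_n$, whereas any single $\Phi_j^T \Phi_j$ has rank at most $m < n$.

```python
import numpy as np

def big_matrices(N, m, n, rng):
    """Draw N independent m-by-n matrices with i.i.d. N(0, 1/m) entries."""
    return rng.standard_normal((N, m, n)) / np.sqrt(m)

rng = np.random.default_rng(4)
N, m, n = 20000, 4, 8
Phis = big_matrices(N, m, n, rng)

# Empirical mean of Phi_j^T Phi_j over the N blocks.
gram_mean = np.einsum("jki,jkl->il", Phis, Phis) / N
max_dev = np.abs(gram_mean - np.eye(n)).max()

# A single Gram matrix is rank deficient because m < n.
single_rank = np.linalg.matrix_rank(Phis[0].T @ Phis[0])
```

This contrast between a rank-deficient single Gram matrix and a well-conditioned average is one way to read the benefit of drawing a distinct matrix per block.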


A. Uniqueness 

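Before presenting the uniqueness results, we review the matrix extension of the Chernoff inequality [28] that is
summarized in the following lemma.

Lemma 2. (Matrix Chernoff, Theorem 5.1.1 in [28]). Consider a finite sequence $\{X_k\} \subset \mathbb{R}^{n \times n}$ of independent,
random and positive semidefinite Hermitian matrices that satisfy $\lambda_{\max}(X_k) \le R$. Define the random matrix
$Y = \sum_k X_k$. Compute the expectation parameters: $\mu_{\max} = \lambda_{\max}(\mathbb{E}Y)$ and $\mu_{\min} = \lambda_{\min}(\mathbb{E}Y)$. Then, for $\theta > 0$,

$$\mathbb{E}\lambda_{\max}(Y) \le \frac{e^{\theta} - 1}{\theta}\mu_{\max} + \frac{1}{\theta} R \log n \qquad (9)$$

and

$$\mathbb{E}\lambda_{\min}(Y) \ge \frac{1 - e^{-\theta}}{\theta}\mu_{\min} - \frac{1}{\theta} R \log n \qquad (10)$$

Furthermore, for $\delta \ge 0$,

$$\mathbb{P}\{\lambda_{\max}(Y) \ge (1 + \delta)\mu_{\max}\} \le n \left[\frac{e^{\delta}}{(1 + \delta)^{1 + \delta}}\right]^{\mu_{\max}/R} \qquad (11)$$

and

$$\mathbb{P}\{\lambda_{\min}(Y) \le (1 - \delta)\mu_{\min}\} \le n \left[\frac{e^{-\delta}}{(1 - \delta)^{1 - \delta}}\right]^{\mu_{\min}/R} \qquad (12)$$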
This lemma will be used to show that, with a high probability, the Hessian matrix $\hat{Q}$ of $\hat{g}(d)$, which is a sum of
random independent matrices, is full rank and invertible.

In the following theorem, let $\mu_0$ denote the lower bound of the smallest eigenvalue of the covariance matrix
$\mathbb{E}\{\alpha_j\alpha_j^T\}$. Note that the covariance matrix must be full rank, or equivalently $\mu_0 > 0$; otherwise even the original
dictionary learning problem (based on the complete data) would not result in a unique solution. On top of that, the
magnitude of $\mu_0$ has a direct impact on the condition number of the Hessian matrix and the numerical stability of
DL-M, as well as DL.

Theorem 3. In a BIG measurement scheme with $N \gg \frac{n \log n}{\mu_0}$, $\hat{g}(d)$ has a unique minimum with a high
probability.


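Proof: Taking the derivative of $\hat{g}(d)$ with respect to $d$ and setting it equal to zero results in the linear equation
$\hat{Q}d = -\hat{f}$. Thus, to prove that the solution is unique, we must show that the Hessian matrix $\hat{Q}$ is invertible (with a high
probability) for large $N$. Equivalently, we must show that the probability $\mathbb{P}\{\lambda_{\min}(\hat{Q}) > 0\}$ is close to one when
$N \gg \frac{n \log n}{\mu_0}$. Since $\hat{Q}$ is a sum of independent matrices, we may use the matrix Chernoff inequality from Lemma
2. Hence, we must compute the following quantities:

$$R = \sup_j \lambda_{\max}\left(\alpha_j\alpha_j^T \otimes \Phi_j^T \Phi_j\right)$$

and

$$\mu_{\min} = \lambda_{\min}\left(\mathbb{E}\sum_{j=1}^{N} \alpha_j\alpha_j^T \otimes \Phi_j^T \Phi_j\right) = \lambda_{\min}\left(\mathbb{E}\sum_{j=1}^{N} \alpha_j\alpha_j^T \otimes I_n\right)$$

Using properties of the Kronecker product,

$$\lambda_{\max}\left(\alpha_j\alpha_j^T \otimes \Phi_j^T \Phi_j\right) = \lambda_{\max}\left(\alpha_j\alpha_j^T\right)\lambda_{\max}\left(\Phi_j^T \Phi_j\right) \le \|\alpha_j\|_2^2 (1 + \delta)$$

where the inequality holds with probability $1 - 2e^{-m\,c(\delta)}$, with $c(\delta) > 0$ depending only on $\delta$, for a random
Gaussian measurement matrix $\Phi_j$ [27]. Suppose the energy of every $\alpha_j$ is bounded by some constant $\nu = O(n)$, given
that $x_j$ has bounded energy. Therefore, with probability $1 - 2e^{-m\,c(\delta)}$, $R \le (1 + \delta)\nu$. Roughly speaking, assuming
that pixel intensities are in the range $[0, 1]$, $R \approx n$.

Again, using properties of the Kronecker product,

$$\mu_{\min} = \lambda_{\min}\left(\mathbb{E}\sum_{j=1}^{N} \alpha_j\alpha_j^T\right) = N\lambda_{\min}\left(\mathbb{E}\{\alpha_j\alpha_j^T\}\right) \ge N\mu_0$$

Based on Lemma 2, specifically using (12) with $\delta = 1$,

$$\mathbb{P}\{\lambda_{\min}(\hat{Q}) \le 0\} \le n e^{-\mu_{\min}/R} \le n e^{-N\mu_0/R} \qquad (13)$$

Requiring $n e^{-N\mu_0/R} \ll 1$, and that $R \approx n$, is equivalent to $N \gg \frac{n \log n}{\mu_0}$. ■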
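The last two steps of the proof compress some arithmetic. The following worked derivation spells it out, under the convention $0^0 = 1$ when (12) is evaluated at $\delta = 1$, and using the approximation $R \approx n$ stated in the proof:

```latex
\begin{align*}
\left.\left[\frac{e^{-\delta}}{(1-\delta)^{1-\delta}}\right]^{\mu_{\min}/R}\right|_{\delta = 1}
  &= \left[\frac{e^{-1}}{0^{0}}\right]^{\mu_{\min}/R} = e^{-\mu_{\min}/R}, \\
n\, e^{-N\mu_0/R} \ll 1
  &\iff \frac{N\mu_0}{R} \gg \log n
  \iff N \gg \frac{R \log n}{\mu_0} \approx \frac{n \log n}{\mu_0}.
\end{align*}
```

The first line explains why the bracketed factor in (12) collapses to $e^{-\mu_{\min}/R}$, and the second line shows where the block-count requirement in Theorem 3 comes from.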

We have established that the upper level problem results in a unique solution with a high probability^6. To complete
this subsection, we use the following result from [29], which implies that the lower level problem
$\mathcal{L}^{\arg}_{\min}(y_j, \Phi_j D, \lambda)$ has a unique solution.

Lemma 4. [29] If entries of $D$ are drawn from a continuous probability distribution on $\mathbb{R}^{n \times p}$, then for any
$x_j$ and $\lambda > 0$ the lasso solution $\mathcal{L}^{\arg}_{\min}(x_j, D, \lambda)$ is unique with probability one.

Since each $\Phi_j$ is drawn from a continuous probability distribution in the BIG measurement scheme, $\Phi_j D$ is also
distributed continuously in the space $\mathbb{R}^{m \times p}$, and this establishes that $\mathcal{L}^{\arg}_{\min}(y_j, \Phi_j D, \lambda)$ is unique.


6 Clearly, projecting the resultant dictionary onto the space of matrices with a constant Frobenius-norm would preserve the uniqueness
since there is only a single point on the sphere of constant-norm matrices that is closest to the current dictionary.








B. Accuracy 

In this subsection, we shall compute stochastic upper bounds for the $\ell_2$ distance between the DL-M solution
and the DL solution for a single iteration of dictionary update and for fixed $\alpha_j$'s. The extension of these results to
multiple iterations is left as future work. As before, in the following results, we omit the iteration superscript for
simplicity.

Let us define the corresponding standard quadratic form $g(d) = \psi(D)$ for the upper-level DL problem:

$$g(d) = \frac{1}{2} d^T Q d + f^T d + c \qquad (14)$$

where

$$Q = \sum_{j=1}^{N} \alpha_j\alpha_j^T \otimes I_n$$

$$f = -\sum_{j=1}^{N} \mathrm{vec}\left(x_j\alpha_j^T\right)$$

$$c = \frac{1}{2}\sum_{j=1}^{N} \|x_j\|_2^2 + \lambda\sum_{j=1}^{N} \|\alpha_j\|_1$$

For BIG measurements, $\mathbb{E}\{\Phi_j^T \Phi_j\} = I_n$. Therefore, it is easy to verify that $Q = \mathbb{E}\{\hat{Q}\}$, $f = \mathbb{E}\{\hat{f}\}$ and $c = \mathbb{E}\{\hat{c}\}$,
where it is assumed that the data and coefficients are fixed. This leads to the following lemma that points out the
unbiasedness of the compressive objective function.

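Lemma 5. $\hat{g}(d)$ is an unbiased estimator of $g(d)$ for BIG measurements:

$$\mathbb{E}\{\hat{g}(d)\} = g(d)$$

Proof:

$$\begin{aligned}
\mathbb{E}\{\hat{g}(d)\} &= \mathbb{E}\left\{\frac{1}{2} d^T \hat{Q} d + \hat{f}^T d + \hat{c}\right\} \\
&= \frac{1}{2} d^T \mathbb{E}\{\hat{Q}\} d + \mathbb{E}\{\hat{f}\}^T d + \mathbb{E}\{\hat{c}\} \\
&= \frac{1}{2} d^T Q d + f^T d + c \\
&= g(d)
\end{aligned}$$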

The following crucial lemma implies that the two objective functions $\hat{g}(d)$ and $g(d)$ get arbitrarily close as the
number of blocks ($N$) approaches infinity.

Lemma 6. ([30], based on Theorem III.1) In a BIG scheme,

$$\mathbb{P}\left\{ \frac{\left| \sum_{j=1}^{N} \|\Phi_j(x_j - D\alpha_j)\|_2^2 - \sum_{j=1}^{N} \|x_j - D\alpha_j\|_2^2 \right|}{\sum_{j=1}^{N} \|x_j - D\alpha_j\|_2^2} > \epsilon \right\} \le 2e^{-C\epsilon m\gamma}$$

where $\gamma \in [1, N]$ can be computed as:

$$\gamma = \frac{\sum_{j=1}^{N} \|x_j - D\alpha_j\|_2^2}{\max_j \|x_j - D\alpha_j\|_2^2}$$

We have simplified and customized Theorem III.1 of [30] for our problem here. Specifically, the bounds in [30]
are tighter but more difficult to interpret. The above lemma states that, with a constant (high) probability that
depends on $m$, the deviation $\epsilon$ is inversely proportional to $N$ when the signal energy is evenly distributed among
blocks.
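The concentration behavior described by the lemma can be explored empirically. The sketch below is an illustration on synthetic data, not a reproduction of the bound in [30]: the residuals $x_j - D\alpha_j$ are replaced by hypothetical i.i.d. Gaussian vectors, and `relative_deviation` is a helper name introduced here. It compares distinct per-block matrices with a single matrix reused for every block.

```python
import numpy as np

def relative_deviation(R, Phis):
    """|sum_j ||Phi_j r_j||^2 - sum_j ||r_j||^2| / sum_j ||r_j||^2 for residual rows r_j of R."""
    compressed = np.einsum("jkl,jl->jk", Phis, R)   # row j holds Phi_j r_j
    full = float(np.sum(R ** 2))
    return abs(float(np.sum(compressed ** 2)) - full) / full

rng = np.random.default_rng(5)
n, m = 16, 4

# Distinct Gaussian matrix per block (BIG scheme), for an increasing number of blocks.
deviation = {}
for N in (10, 1000, 20000):
    R = rng.standard_normal((N, n))                      # stand-in residuals
    Phis = rng.standard_normal((N, m, n)) / np.sqrt(m)
    deviation[N] = relative_deviation(R, Phis)

# Same experiment with one fixed block matrix shared by all blocks.
N = 20000
R = rng.standard_normal((N, n))
Phi_fixed = rng.standard_normal((m, n)) / np.sqrt(m)
deviation_fixed = relative_deviation(R, np.broadcast_to(Phi_fixed, (N, m, n)))
```

With distinct matrices the deviation shrinks as `N` grows. With a shared matrix and isotropic residuals, the ratio instead tends toward a value determined by that single matrix, so it does not benefit from the same averaging across blocks.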
Lemma 6 can be further customized by noticing that

$$\sum_{j=1}^{N} \|\Phi_j(x_j - D\alpha_j)\|_2^2 - \sum_{j=1}^{N} \|x_j - D\alpha_j\|_2^2 = 2\left[\hat{g}(d) - g(d)\right]$$

and that $g(d) \ge \frac{1}{2}\sum_{j=1}^{N} \|x_j - D\alpha_j\|_2^2$, leading to

$$\mathbb{P}\{|\hat{g}(d) - g(d)| > \epsilon g(d)\} \le 2e^{-C\epsilon m\gamma} \qquad (15)$$

Hereafter, to simplify the notation, let

$$\hat{d} = \arg\min_{d} \hat{g}(d) \qquad (16)$$

and

$$d^* = \arg\min_{d} g(d) \qquad (17)$$


The following theorem provides upper bounds for the expectation $\mathbb{E}\{\|\hat{d} - d^*\|_2^2\}$. Suppose that, for a fixed $N$,
there exists a positive constant $\mu_1$ such that $\lambda_{\min}(\hat{Q}) \ge \mu_1$. Clearly, this is a reasonable assumption for large $N$
according to Theorem 3. More specifically, according to Lemma 2, $\mathbb{E}\{\lambda_{\min}(\hat{Q})\}$ grows linearly with $N$ (because
$\mu_{\min}$ grows linearly with $N$).

Theorem 7. $\hat{d}$ and $d^*$ converge as $N$ approaches infinity. Specifically,

$$\mathbb{E}\{\|\hat{d} - d^*\|_2^2\} \le \frac{2\epsilon}{\mu_1} g(d^*)$$

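Proof: We start by writing the Taylor expansion of the quadratic function $\hat{g}(d)$ at $\hat{d}$:

$$\hat{g}(d^*) = \hat{g}(\hat{d}) + \left.\frac{\partial \hat{g}(d)}{\partial d}\right|_{d = \hat{d}}^{T} (d^* - \hat{d}) + \frac{1}{2}(d^* - \hat{d})^T \hat{Q} (d^* - \hat{d})$$

Since $\hat{d} = \arg\min_d \hat{g}(d)$,

$$\left.\frac{\partial \hat{g}(d)}{\partial d}\right|_{d = \hat{d}} = 0$$

and we can write

$$\hat{g}(d^*) - \hat{g}(\hat{d}) = \frac{1}{2}(d^* - \hat{d})^T \hat{Q} (d^* - \hat{d}) \ge \frac{\lambda_{\min}(\hat{Q})}{2}\|d^* - \hat{d}\|_2^2 \ge \frac{\mu_1}{2}\|d^* - \hat{d}\|_2^2$$

Taking the expected value of both sides,

$$\mathbb{E}\{\|d^* - \hat{d}\|_2^2\} \le \frac{2}{\mu_1}\mathbb{E}\{\hat{g}(d^*) - \hat{g}(\hat{d})\} \qquad (18)$$

From Lemma 6, we know that with a probability of at least $1 - 2e^{-C\epsilon m\gamma}$ the following inequalities hold:

$$(1 - \epsilon) g(\hat{d}) \le \hat{g}(\hat{d}) \le (1 + \epsilon) g(\hat{d}) \qquad (19)$$

$$(1 - \epsilon) g(d^*) \le \hat{g}(d^*) \le (1 + \epsilon) g(d^*) \qquad (20)$$

On the other hand, we have the following inequalities at the optimum points of $g(d)$ and $\hat{g}(d)$:

$$g(d^*) \le g(\hat{d}) \qquad (21)$$

and

$$\hat{g}(\hat{d}) \le \hat{g}(d^*) \qquad (22)$$

It is easy to check that, by combining (19), (20), (21) and (22), we can arrive at the following inequality:

$$(1 - \epsilon) g(d^*) \le \hat{g}(\hat{d}) \le (1 + \epsilon) g(d^*)$$

or equivalently,

$$-\epsilon g(d^*) \le g(d^*) - \hat{g}(\hat{d}) \le \epsilon g(d^*)$$

Taking the expected value, we get

$$-\epsilon g(d^*) \le \mathbb{E}\{g(d^*) - \hat{g}(\hat{d})\} \le \epsilon g(d^*) \qquad (23)$$

From Lemma 5, we know that

$$\mathbb{E}\{\hat{g}(d^*)\} = g(d^*) = \mathbb{E}\{g(d^*)\} \qquad (24)$$

Therefore,

$$0 \le \mathbb{E}\{\hat{g}(d^*) - \hat{g}(\hat{d})\} = \mathbb{E}\{g(d^*) - \hat{g}(\hat{d})\} \qquad (25)$$

Use (23) and (25) to arrive at

$$\mathbb{E}\{\hat{g}(d^*) - \hat{g}(\hat{d})\} \le \epsilon g(d^*) \qquad (26)$$

which, along with (18), completes the proof. ■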

Note that $\epsilon$ is inversely proportional to $N$ and $g(d^*)$ grows linearly with $N$. Furthermore, using (12) for a
constant probability, it can be shown that $\mu_1$ increases linearly with $N$, making the ratio $\frac{2\epsilon}{\mu_1} g(d^*)$ arbitrarily small
for $N \to \infty$.

Finally, we show that after projection onto $\|D\|_F = c$, where $c$ is a positive constant, the upper bound of the
estimation error would still approach zero for large $N$. By noticing that $\|D\|_F = \|d\|_2$, this projection can be
written as

$$\hat{d} \mapsto \frac{c\hat{d}}{\|\hat{d}\|_2}, \qquad d^* \mapsto \frac{c d^*}{\|d^*\|_2}$$

It is easy to show that

$$\left\| \frac{c\hat{d}}{\|\hat{d}\|_2} - \frac{c d^*}{\|d^*\|_2} \right\|_2 \le c \max\left\{ \frac{1}{\|\hat{d}\|_2}, \frac{1}{\|d^*\|_2} \right\} \|\hat{d} - d^*\|_2$$

Using $\hat{d} = -\hat{Q}^{-1}\hat{f}$ and $d^* = -Q^{-1}f$, lower bounds for $\|\hat{d}\|_2$ and $\|d^*\|_2$ can be computed as

$$\|\hat{d}\|_2 \ge \frac{\|\hat{f}\|_2}{\lambda_{\max}(\hat{Q})}, \qquad \|d^*\|_2 \ge \frac{\|f\|_2}{\lambda_{\max}(Q)}$$

Therefore

$$\max\left\{ \frac{1}{\|\hat{d}\|_2}, \frac{1}{\|d^*\|_2} \right\} \le \max\left\{ \frac{\lambda_{\max}(\hat{Q})}{\|\hat{f}\|_2}, \frac{\lambda_{\max}(Q)}{\|f\|_2} \right\}$$

Using Lemma 2 and other well-established concentration inequalities, one can find stochastic upper bounds for the
quantities above. However, given that $\lambda_{\max}(\hat{Q})$, $\lambda_{\max}(Q)$, $\|\hat{f}\|_2$ and $\|f\|_2$ scale linearly with $N$, we can safely
conclude that the estimation error remains bounded by an arbitrarily small number as $N$ approaches infinity.
Moreover, intuitively speaking, the $\ell_2$ norm of $\hat{d}$ and $d^*$ tends to increase before projection (which is the reason
for bounding the dictionary in the first place) and the ratios $c/\|\hat{d}\|_2$ and $c/\|d^*\|_2$ are likely to be smaller than one,
resulting in a decrease in the estimation error.
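The bound on the distance between the two projected vectors can be spot-checked numerically. This sketch draws random pairs of vectors (an arbitrary test distribution, assumed only for illustration) and records the worst observed ratio between the left-hand side and the right-hand side; `project_to_sphere` is a hypothetical helper name.

```python
import numpy as np

def project_to_sphere(d, c):
    """Rescale d to have l2 norm c, i.e. d -> c * d / ||d||_2."""
    return c * d / np.linalg.norm(d)

rng = np.random.default_rng(6)
c = 2.0
worst_ratio = 0.0
for _ in range(1000):
    d_hat = rng.standard_normal(20) * rng.uniform(0.1, 10.0)
    d_star = d_hat + rng.standard_normal(20) * rng.uniform(0.01, 1.0)
    lhs = np.linalg.norm(project_to_sphere(d_hat, c) - project_to_sphere(d_star, c))
    rhs = (c * max(1.0 / np.linalg.norm(d_hat), 1.0 / np.linalg.norm(d_star))
           * np.linalg.norm(d_hat - d_star))
    worst_ratio = max(worst_ratio, lhs / rhs)
```

The ratio never exceeds one in this experiment, consistent with the inequality. One way to see why it holds: rescaling the longer vector to the length of the shorter one is a projection onto a ball, which cannot increase the distance to a point already inside that ball.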


IV. The Main Algorithm 

The employed algorithm for DL-M is based on (4). However, as explained below, we introduce several
modifications to decrease the computational complexity and speed up the convergence. Similar to other DL algorithms,
such as [4], the proposed algorithm consists of two stages that are called the sparse coding stage and the dictionary
update stage. We describe them individually in the following subsections.
A. The sparse coding stage

As we mentioned in Section II-B, the basis pursuit (exact) CS recovery is the limit of the Lasso solution
$\mathcal{L}^{\arg}_{\min}(y, \Phi D, \lambda)$ as $\lambda$ approaches zero [31]. However, a truly sparse and exact representation is usually not possible
using any dictionary with a finite size. As a result, in sparse recovery of natural images, $\lambda$ is usually selected to
be a small number rather than zero even in noiseless scenarios [16], [17]. Our algorithm starts from a coarse and
overly sparse representation, by selecting the initial $\lambda$ to be large, and gradually reduces $\lambda$ until the desired balance
between the total error sum of squares and the sparsity is achieved. The idea behind this modification is that the
initial dictionary is suboptimal and not capable of giving an exact sparse representation. However, as the iterations
pass, the dictionary becomes closer to the optimal dictionary and $\lambda$ must be decreased to get a sparse representation
that closely adheres to the measurements.

Initializing the counter at $t = 0$ and starting from an initial dictionary $D = D^{(0)}$, the sparse coding stage consists
of performing the following optimization:

$$\forall j: \; \alpha_j^{(t)} = \arg\min_{\alpha} \frac{1}{2}\|y_j - \Phi_j D^{(t)}\alpha\|_2^2 + \lambda^{(t)}\|\alpha\|_1 \qquad (27)$$

We deploy an exponential decay for $\lambda^{(t)}$:

$$\lambda^{(t)} = \max\left\{ \lambda_0 e^{-\frac{t}{T^*}\log\left(\frac{\lambda_0}{\lambda^*}\right)}, \lambda^* \right\} \qquad (28)$$

According to (28), $\lambda$ is decreased from $\lambda = \lambda_0$ to $\lambda = \lambda^*$ in $t = T^*$ iterations and stays fixed at $\lambda = \lambda^*$
henceforth. For an exact recovery, $\lambda^* = 0$ is seemingly a plausible choice. However, for the reasons mentioned
earlier, we set $\lambda^*$ to a very small but non-zero value that is specified in the simulations section.
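The decay schedule can be written as a small function. This is a sketch of one reading of (28), in which the exponent is $-(t/T^*)\log(\lambda_0/\lambda^*)$ so that the schedule starts at $\lambda_0$ and reaches $\lambda^*$ at $t = T^*$; the function name and the numerical values in the example are illustrative assumptions.

```python
import math

def lambda_schedule(t, lam0, lam_star, T_star):
    """Exponentially decay lambda from lam0 (t = 0) to lam_star (t = T_star), then hold."""
    decayed = lam0 * math.exp(-(t / T_star) * math.log(lam0 / lam_star))
    return max(decayed, lam_star)

# Example: decay from 1.0 to 0.01 over 10 iterations, then stay at 0.01.
schedule = [lambda_schedule(t, 1.0, 0.01, 10) for t in range(15)]
```

Equivalently, the decayed value is $\lambda_0 (\lambda^*/\lambda_0)^{t/T^*}$, a geometric interpolation between the two endpoints, which requires $\lambda^* > 0$ as noted in the text.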


B. The dictionary update stage 

The quadratic optimization problem of (16) can be computationally inefficient to solve. More specifically, solving
(16) in one step requires computing the inverse of $Q$ (if it exists), which has a time complexity of roughly
$O(n^3 p^3)$ and is a memory-intensive operation. The strategy that we employ in this paper, similar to what was
proposed in [3], [18], [24], is to perform a gradient descent step:

$$D^{(t+1)} = D^{(t)} - \mu^{(t)}\,\nabla_D \mathcal{L}(D^{(t)})$$ (29)

where $\nabla_D \mathcal{L}(D)$ can be computed efficiently:

$$\nabla_D \mathcal{L}(D) = -\sum_{j=1}^{N} \Phi_j^T\,(y_j - \Phi_j D\alpha_j)\,\alpha_j^T$$ (30)

The step size $\mu^{(t)} > 0$ can be iteratively decreased with $t$ [18], or it can be optimized in a steepest
descent fashion, as described below.^7


The optimal value of the step size $\mu^{(t)}$ can be computed using a simple line search [19]. However, for a
quadratic objective function, we can derive $\mu^{(t)}$ in closed form as shown below. Let
$G^{(t)} = G(D^{(t)}) = \nabla_D \mathcal{L}(D^{(t)})$. Then,

$$\mu^{(t)} = \arg\min_{\mu} \sum_{j=1}^{N} \|y_j - \Phi_j D^{(t+1)}\alpha_j^{(t)}\|_2^2 = \arg\min_{\mu} \sum_{j=1}^{N} \|y_j - \Phi_j (D^{(t)} - \mu G^{(t)})\alpha_j^{(t)}\|_2^2$$

Writing the optimality conditions for the objective function above, we arrive at the following solution for
$\mu^{(t)}$:

$$\mu^{(t)} = \frac{\|G^{(t)}\|_F^2}{\sum_{j=1}^{N} \|\Phi_j G^{(t)}\alpha_j^{(t)}\|_2^2}$$ (31)

7 If $Q$ is well-conditioned, a single step of steepest descent can give a close approximation of the solution of (16).
Since the initial dictionary consists of $p$ unit-norm columns, $\|D^{(0)}\|_F = \sqrt{p}$. As discussed in Section II,
DL results in an unbounded dictionary if no constraint is put on the dictionary norm or the norm of its columns. Here,
we employ a Frobenius bound on $D$ because it lets different dictionary columns have distinct norms. Specifically,
after each update we enforce the constraint $\|D^{(t+1)}\|_F = \sqrt{p}$. This is done by multiplying $D^{(t+1)}$
by $\sqrt{p}\,\|D^{(t+1)}\|_F^{-1}$. Algorithm 1 gives a summary of these steps.
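
The dictionary update stage can be sketched with NumPy as follows. This is a minimal illustration rather than the authors' implementation: it assumes the objective has the form $\frac{1}{2}\sum_j \|y_j - \Phi_j D\alpha_j\|_2^2$, so that the gradient and step size take the forms of (30) and (31). The function names `objective` and `dictionary_update`, the problem sizes and the synthetic data are all hypothetical.

```python
import numpy as np

def objective(D, Phis, ys, alphas):
    """0.5 * sum_j ||y_j - Phi_j D alpha_j||_2^2 (assumed form of the objective)."""
    return 0.5 * sum(np.sum((y - Phi @ D @ a) ** 2)
                     for Phi, y, a in zip(Phis, ys, alphas))

def dictionary_update(D, Phis, ys, alphas):
    """One steepest-descent step on the objective, then Frobenius rescaling."""
    p = D.shape[1]
    # Gradient as in (30): G = -sum_j Phi_j^T (y_j - Phi_j D alpha_j) alpha_j^T
    G = np.zeros_like(D)
    for Phi, y, a in zip(Phis, ys, alphas):
        G -= np.outer(Phi.T @ (y - Phi @ D @ a), a)
    # Closed-form step size as in (31): ||G||_F^2 / sum_j ||Phi_j G alpha_j||_2^2
    denom = sum(np.sum((Phi @ G @ a) ** 2) for Phi, a in zip(Phis, alphas))
    mu = np.sum(G ** 2) / denom if denom > 0 else 0.0
    D_step = D - mu * G
    # Rescale so that ||D||_F = sqrt(p)
    D_scaled = D_step * (np.sqrt(p) / np.linalg.norm(D_step))
    return D_scaled, D_step, mu

# Synthetic example; sizes and data are made up for illustration only.
rng = np.random.default_rng(0)
n, m, p, N = 8, 4, 12, 50
D0 = rng.normal(size=(n, p))
D0 /= np.linalg.norm(D0, axis=0)  # unit-norm columns, so ||D0||_F = sqrt(p)
Phis = [rng.normal(size=(m, n)) / np.sqrt(m) for _ in range(N)]
alphas = [rng.normal(size=p) * (rng.random(p) < 0.25) for _ in range(N)]
ys = [rng.normal(size=m) for _ in range(N)]
D1, D_step, mu = dictionary_update(D0, Phis, ys, alphas)
```

Because the objective is quadratic along the gradient direction, the closed-form step is the exact minimizer of the line search, so no trial step sizes are needed. The rescaling is applied afterwards and only changes the overall norm of the dictionary.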

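Algorithm 1 Dictionary learning from compressive samples.

Require: $D$, $y_j$ and $\Phi_j$ for every $j$, $\lambda_0$, $\lambda^*$, $T^*$, $T_{\max}$
Initialization: $t \leftarrow 1$, $D^{(t)} \leftarrow D$
while $t < T_{\max}$ do
    Sparse Coding
    Compute $\alpha_j^{(t)}$ for every $j$:
        $\alpha_j^{(t)} = \arg\min_{\alpha} \frac{1}{2}\|y_j - \Phi_j D^{(t)}\alpha\|_2^2 + \lambda^{(t)}\|\alpha\|_1$
    with
        $\lambda^{(t)} = \max\{\lambda_0 e^{-\frac{t}{T^*}\log(\lambda_0/\lambda^*)},\ \lambda^*\}$
    Dictionary Update
    Compute $G^{(t)}$ and $\mu^{(t)}$ using:
        $G^{(t)} = -\sum_{j=1}^{N} \Phi_j^T (y_j - \Phi_j D^{(t)}\alpha_j^{(t)})\,\alpha_j^{(t)T}$
        $\mu^{(t)} = \|G^{(t)}\|_F^2 \,/\, \sum_{j=1}^{N} \|\Phi_j G^{(t)}\alpha_j^{(t)}\|_2^2$
    Compute $D^{(t+1)} = D^{(t)} - \mu^{(t)} G^{(t)}$
    Normalize the dictionary:
        $D^{(t+1)} \leftarrow \sqrt{p}\, D^{(t+1)} / \|D^{(t+1)}\|_F$
    $t \leftarrow t + 1$
end while
return $D^{(T_{\max})}$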
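
The alternation in Algorithm 1 can be sketched end to end as below. This is a hedged illustration under several assumptions: the Lasso subproblem is solved approximately with a plain iterative soft-thresholding (ISTA) routine, which is a stand-in and not the Lasso solver used in the paper; the iteration counter starts at 0, following the prose of the sparse coding stage; and the names `ista` and `dl_m`, the problem sizes and the synthetic data are hypothetical.

```python
import numpy as np

def ista(A, y, lam, n_iter=100):
    """Approximate argmin_a 0.5*||y - A a||_2^2 + lam*||a||_1 by soft-thresholding."""
    L = np.linalg.norm(A, 2) ** 2 + 1e-12  # Lipschitz constant of the smooth term
    a = np.zeros(A.shape[1])
    for _ in range(n_iter):
        z = a - A.T @ (A @ a - y) / L
        a = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)
    return a

def dl_m(D, ys, Phis, lam0, lam_star, t_star, t_max):
    """Alternate sparse coding and one steepest-descent dictionary step."""
    p = D.shape[1]
    D = D * (np.sqrt(p) / np.linalg.norm(D))
    for t in range(t_max):
        # Decaying Lasso weight, assumed to follow (28)
        lam = max(lam0 * np.exp(-(t / t_star) * np.log(lam0 / lam_star)), lam_star)
        # Sparse coding stage: one Lasso problem per block
        alphas = [ista(Phi @ D, y, lam) for Phi, y in zip(Phis, ys)]
        # Dictionary update stage: gradient and closed-form step size
        G = np.zeros_like(D)
        for Phi, y, a in zip(Phis, ys, alphas):
            G -= np.outer(Phi.T @ (y - Phi @ D @ a), a)
        denom = sum(np.sum((Phi @ G @ a) ** 2) for Phi, a in zip(Phis, alphas))
        if denom > 0:
            D = D - (np.sum(G ** 2) / denom) * G
        # Frobenius normalization: ||D||_F = sqrt(p)
        D = D * (np.sqrt(p) / np.linalg.norm(D))
    return D

# Synthetic example with space-varying Gaussian sensing matrices (made-up sizes).
rng = np.random.default_rng(1)
n, m, p, N = 16, 8, 24, 60
D_true = rng.normal(size=(n, p))
D_true /= np.linalg.norm(D_true, axis=0)
blocks = [D_true @ (rng.normal(size=p) * (rng.random(p) < 0.1)) for _ in range(N)]
Phis = [rng.normal(size=(m, n)) / np.sqrt(m) for _ in range(N)]
ys = [Phi @ x for Phi, x in zip(Phis, blocks)]
D_init = rng.normal(size=(n, p))
D_init /= np.linalg.norm(D_init, axis=0)
D_hat = dl_m(D_init, ys, Phis, lam0=0.05, lam_star=0.001, t_star=3, t_max=5)
```

The sketch keeps a single gradient step per outer iteration, as in the algorithm summary, so its cost per iteration is dominated by the sparse coding stage.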


We conclude this section with an empirical test that demonstrates the performance of the proposed DL-M algorithm
using space-varying block measurements and compares it with the case where space-invariant (or fixed) block
measurements are employed.

C. Testing the algorithm: a real-world example 

In this subsection, we test the performance of the described DL-M algorithm for block-CS using a) space-varying
and b) fixed sampling matrices.^8 In this experiment, non-overlapping 8 × 8 patches from Barbara’s image (shown in
Figure 4) are sampled at a 50% sampling ratio, i.e. $n = 64$ and $m = 32$. Pixel intensities are normalized to be in
the range [0,1] and blocks are vectorized and centered. We have used a fixed $\lambda = 0.01$ for this experiment.
Specifically, we are interested in the real-time reconstruction Peak Signal-to-Noise Ratio (PSNR). For the varying
case, sampling matrices are generated randomly for each block according to an independent and identically distributed
Gaussian distribution with variance $1/m$ (equivalent to the BIG measurement scheme described before). Furthermore,
we orthogonalize and normalize the rows of each generated sampling matrix so that measurements are weighted equally
in the learning problem. In the fixed sampling matrix case, a single random matrix is generated and employed for all
blocks. The initial dictionary for this experiment is a redundant dictionary of size $p = 256$ that has been trained
using the K-SVD method and distributed in [4]. This dictionary, which is shown in Figure 2 (top), serves as the
benchmark among redundant dictionaries for natural images.

8 Fixed sampling matrices have been used in dictionary learning from block compressive measurements, such as in [33].

Figure 1 shows the real-time PSNR graphs, i.e. PSNR measured after each iteration of the DL-M algorithm,
for both the space-varying and the fixed sampling matrices, as well as the non-adaptive PSNR. Several
crucial observations can be made about Figure 1. Although a slight improvement is achieved after the first few
iterations of the DL-M algorithm based on the fixed sampling matrix, the PSNR decreases subsequently. The
decline in the PSNR performance is expected because the Hessian matrix $Q$ from (8) would be low-rank when
$\Phi_1 = \Phi_2 = \cdots = \Phi_N$, making DL-M an ill-posed problem.
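
The intuition behind the ill-posedness with a fixed sampling matrix can be illustrated numerically. The sketch below is a simplified check and does not form the paper's matrix $Q$: it only shows that, with a single shared $\Phi$, any dictionary perturbation lying in the null space of $\Phi$ leaves every measurement unchanged, whereas the stacked space-varying matrices have full column rank (a necessary, though not sufficient, condition for identifiability). The helper `random_sensing_matrix`, the use of a QR factorization for row orthonormalization and the choice $N = 4$ are assumptions made for illustration.

```python
import numpy as np

def random_sensing_matrix(m, n, rng):
    """Gaussian m x n matrix (variance 1/m) whose rows are then orthonormalized."""
    A = rng.normal(scale=np.sqrt(1.0 / m), size=(m, n))
    Q, _ = np.linalg.qr(A.T)  # Q is n x m with orthonormal columns
    return Q.T                # so the rows of the result are orthonormal

rng = np.random.default_rng(0)
n, m, p, N = 64, 32, 256, 4

fixed = [random_sensing_matrix(m, n, rng)] * N               # one matrix for all blocks
varying = [random_sensing_matrix(m, n, rng) for _ in range(N)]

# A perturbation Z whose columns lie in the null space of the fixed matrix.
D = rng.normal(size=(n, p))
Phi = fixed[0]
Z = (np.eye(n) - Phi.T @ Phi) @ rng.normal(size=(n, p))
hidden = np.allclose(Phi @ (D + Z), Phi @ D)  # D and D + Z are indistinguishable

rank_fixed = np.linalg.matrix_rank(np.vstack(fixed))      # at most m
rank_varying = np.linalg.matrix_rank(np.vstack(varying))  # can reach n
```

In the fixed case the measurements only ever see $\Phi D$, so the component of $D$ outside the row space of $\Phi$ cannot be learned from the data.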



Fig. 1. Graphs of the real-time PSNR for cases of adaptive recovery using space-varying and fixed sampling matrices versus the non-adaptive 
recovery using a universal dictionary. The horizontal axis shows the iteration count t. 
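
For reference, the PSNR reported after each iteration can be computed as in the sketch below. It assumes the usual definition $10\log_{10}(\text{peak}^2/\text{MSE})$ with a peak value of 1, matching pixel intensities normalized to [0,1]; the function name `psnr` and the toy pixel values are hypothetical.

```python
import math

def psnr(reference, estimate, peak=1.0):
    """Peak signal-to-noise ratio in dB between two equally long pixel sequences."""
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(peak ** 2 / mse)

# Toy example: every pixel is off by 0.1, so the MSE is 0.01.
value = psnr([0.0, 1.0, 0.5, 0.25], [0.1, 0.9, 0.6, 0.15])
```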

Meanwhile, the PSNR graph for the space-varying block-CS shows significant enhancement with respect to the
initial PSNR after just a few iterations. It is helpful to visually inspect the dictionaries before and after
adaptation. These dictionaries are shown in Figure 2. The resultant adaptive dictionary is shown at the bottom of
Figure 2. A careful inspection reveals that some of the texture from the input image has been captured within the
adaptive dictionary. These texture patterns are the parts of the image that could not be compactly represented using
the initial dictionary. The misfit of the initial dictionary results in artifacts in the non-adaptive recovery around
the textured areas, as can be seen in Figure 3a. The recovery after adaptation is shown in Figure 3b for comparison.
Note the improvements in the textured areas in the adaptively recovered image compared to the initial recovery using
the universal dictionary.
Fig. 2. Dictionaries. Top: the initial universal dictionary; bottom: the adapted dictionary based on 50% sampling from Barbara’s image.



(a) Using the universal dictionary (b) Using the adaptive dictionary 

Fig. 3. Recoveries from 50% sampling using the fixed universal and adaptive dictionaries. 
               | Universal dictionary                            | Adaptive dictionary
Image          | 5%     10%    20%    25%    30%    40%    50%   | 5%     10%    20%    25%    30%    40%    50%
Barbara        | 22.14  23.01  25.19  26.52  27.43  29.19  31.09 | 22.20  23.22  25.99  27.88  29.39  32.05  34.85
boat           | 23.47  24.84  28.47  30.80  32.41  35.24  38.18 | 23.51  25.03  29.11  31.45  33.03  35.95  39.23
bridge         | 21.98  23.15  26.05  27.88  29.16  31.67  34.79 | 22.03  23.34  26.50  28.24  29.48  31.97  35.13
couple         | 23.59  24.94  28.48  30.80  32.41  35.43  38.77 | 23.65  25.33  29.29  31.45  32.95  35.87  39.29
fingerprint 1  | 17.77  19.31  23.66  26.88  29.27  33.87  39.02 | 17.95  20.71  26.18  29.29  31.65  36.03  41.29
fingerprint 2  | 17.58  19.13  22.89  25.03  26.33  28.41  30.46 | 17.84  21.35  27.59  30.74  32.95  35.81  38.85
Flintstones    | 17.12  18.59  22.72  25.52  27.41  30.61  33.45 | 17.13  18.73  23.48  26.29  28.17  31.22  33.97
grass          | 12.51  13.29  15.36  16.85  17.99  20.32  23.01 | 12.55  13.52  15.94  17.58  18.88  21.43  24.62
hill           | 25.83  27.21  30.85  33.17  34.70  37.44  40.28 | 25.89  27.76  31.74  33.83  35.26  37.91  41.05
house          | 24.02  25.39  29.22  31.73  33.42  36.42  39.44 | 24.09  25.89  32.27  35.96  38.27  41.83  45.51
Lena           | 25.12  26.75  31.44  34.40  36.33  39.69  43.19 | 25.20  27.56  32.39  34.98  36.76  39.93  43.35
man            | 24.40  25.79  29.43  31.76  33.32  36.19  39.44 | 24.46  26.11  30.14  32.28  33.76  36.52  39.70
matches        | 21.16  22.90  27.83  30.82  32.62  35.73  39.35 | 21.21  23.93  29.18  31.70  33.50  36.67  40.47
shuttle        | 26.26  28.06  33.74  37.83  40.58  45.22  49.71 | 26.39  29.33  35.70  39.38  41.91  46.20  50.53


TABLE I 

Recovery PSNRs for SET1. PSNRs are in dB. Percentage values are sampling ratios. 



               | Universal dictionary                            | Adaptive dictionary
Image          | 5%     10%    20%    25%    30%    40%    50%   | 5%     10%    20%    25%    30%    40%    50%
Barbara        | 21.17  21.94  23.77  24.88  25.65  27.09  28.55 | 21.23  22.06  24.33  25.75  26.84  28.80  30.83
boat           | 22.34  23.45  26.05  27.54  28.52  30.15  31.61 | 22.37  23.52  26.34  27.81  28.76  30.49  32.32
bridge         | 20.40  21.20  22.91  23.86  24.48  25.54  26.55 | 20.44  21.22  23.01  23.93  24.52  25.68  27.02
couple         | 22.39  23.44  25.97  27.46  28.45  30.12  31.76 | 22.44  23.62  26.40  27.78  28.69  30.34  32.15
fingerprint 1  | 17.09  18.46  21.93  24.22  25.74  28.40  31.13 | 17.21  19.43  23.51  25.53  26.87  29.13  31.62
fingerprint 2  | 17.09  18.51  21.83  23.64  24.75  26.48  28.13 | 17.30  20.38  25.61  28.15  29.95  32.33  34.97
Flintstones    | 16.53  17.83  21.14  23.17  24.43  26.40  27.96 | 16.55  17.89  21.61  23.58  24.79  26.70  28.38
grass          | 11.75  12.38  13.96  15.01  15.77  17.15  18.49 | 11.78  12.50  14.28  15.39  16.21  17.68  19.33
hill           | 24.19  25.20  27.47  28.74  29.56  30.94  32.30 | 24.25  25.41  27.83  28.96  29.69  31.09  32.65
house          | 23.14  24.34  27.44  29.34  30.57  32.62  34.60 | 23.21  24.68  29.33  31.61  32.96  34.89  36.72
Lena           | 24.06  25.41  28.77  30.66  31.80  33.69  35.45 | 24.13  25.96  29.36  31.01  32.04  33.83  35.60
man            | 23.10  24.17  26.65  28.05  28.93  30.46  31.97 | 23.15  24.33  26.96  28.26  29.08  30.60  32.24
matches        | 20.22  21.65  25.10  26.83  27.78  29.32  30.88 | 20.26  22.31  25.82  27.23  28.10  29.56  31.14
shuttle        | 25.47  27.14  31.98  35.11  37.16  40.49  43.69 | 25.57  28.30  33.52  36.25  38.10  41.14  44.16


TABLE II 

Recovery PSNRs for SET2. PSNRs are in dB. Percentage values are sampling ratios. 


V. Simulations and Results 

A. Simulation settings 

The 14 test images, which are down-scaled due to limited space, are shown in Figure 4. Each test
image has a resolution of 512 × 512. In all simulations, we have used a fixed block size of 8 × 8. Input parameters
to Algorithm 1 are $\lambda_0 = 0.05$, $\lambda^* = 0.001$ and $T_{\max} = 20$. However, we selected $T^*$ in the
range [1,10] and proportional to $m$, the number of compressive measurements per block.

This section includes five different simulation settings that are described in the following.

SET1: In this setting, we denoise^9 the test images prior to measurement by performing a Lasso regression for
each (non-overlapping) block with $\lambda = 0.05$ with respect to an “ideal” dictionary that is only known to the
tester. This pre-processing guarantees that, although unknown to the learner, there exists an exact sparse
representation for every test image. The ideal dictionary consists of $p = 256$ atoms (a redundancy of factor 4) and
is learned based on the set of all overlapping blocks of the test image using the dictionary learning algorithm in
[17]. The initial dictionary is computed by cross-validation. Specifically, $10^6$ training patches were randomly
selected from the other images in Figure 4 for training the initial dictionary with $p = 256$; this results in a
distinct initial dictionary for each test image. For learning all ideal and initial dictionaries, we have set the
maximum number of DL iterations

9 Here, denoising refers to the process of removing small representation coefficients, making the exact image representation truly sparse. 
               | Universal dictionary                            | Adaptive dictionary
Image          | 5%     10%    20%    25%    30%    40%    50%   | 5%     10%    20%    25%    30%    40%    50%
Barbara        | 21.08  21.76  23.25  24.06  24.64  25.76  27.16 | 21.17  21.95  23.70  24.82  25.78  27.60  29.76
boat           | 22.24  23.29  25.83  27.33  28.31  30.04  31.69 | 22.30  23.46  26.20  27.71  28.67  30.47  32.32
bridge         | 20.36  21.17  22.97  23.96  24.62  25.80  27.06 | 20.41  21.27  23.10  24.02  24.62  25.82  27.24
couple         | 22.34  23.40  25.96  27.52  28.56  30.47  32.44 | 22.41  23.67  26.39  27.79  28.72  30.51  32.46
fingerprint 1  | 16.98  18.22  21.40  23.49  24.95  27.62  30.53 | 17.14  19.30  23.26  25.30  26.68  28.97  31.58
fingerprint 2  | 17.01  18.50  22.22  24.30  25.57  27.80  30.30 | 17.30  20.55  25.95  28.70  30.75  33.50  36.67
Flintstones    | 16.43  17.62  20.71  22.60  23.88  26.05  28.00 | 16.47  17.78  21.33  23.29  24.58  26.62  28.51
grass          | 11.71  12.27  13.65  14.63  15.37  16.84  18.50 | 11.74  12.37  14.00  15.10  15.94  17.47  19.24
hill           | 24.12  25.15  27.49  28.80  29.65  31.16  32.78 | 24.20  25.42  27.84  29.04  29.80  31.27  32.92
house          | 23.07  24.19  27.09  28.96  30.28  32.62  34.97 | 23.13  24.51  29.08  31.43  32.80  34.83  36.75
Lena           | 23.96  25.24  28.50  30.27  31.37  33.28  35.19 | 24.06  25.84  29.17  30.78  31.81  33.58  35.44
man            | 23.04  24.10  26.63  28.04  28.98  30.59  32.28 | 23.10  24.32  26.97  28.30  29.14  30.73  32.51
matches        | 20.13  21.59  25.08  26.73  27.68  29.28  31.03 | 20.20  22.15  25.79  27.19  28.07  29.61  31.34
shuttle        | 25.40  27.08  31.88  34.80  36.63  39.96  43.28 | 25.56  28.27  33.39  36.06  37.89  40.97  44.13


TABLE III 

Recovery PSNRs for SET3. PSNRs are in dB. Percentage values are sampling ratios. 



               | Universal dictionary                            | Adaptive dictionary
Image          | 5%     10%    20%    25%    30%    40%    50%   | 5%     10%    20%    25%    30%    40%    50%
Barbara        | 20.60  21.07  22.50  23.53  24.33  25.97  28.06 | 20.97  21.91  23.50  24.72  25.81  27.57  29.66
boat           | 21.50  22.01  23.57  24.68  25.50  27.20  29.26 | 21.95  23.15  24.91  26.11  27.10  28.63  30.48
bridge         | 19.69  20.04  21.19  22.01  22.62  23.85  25.31 | 20.07  20.91  22.22  23.06  23.74  24.81  26.12
couple         | 21.61  22.13  23.71  24.86  25.72  27.44  29.41 | 22.06  23.33  25.19  26.41  27.39  28.84  30.57
fingerprint 1  | 16.14  16.69  18.54  20.01  21.20  23.66  26.56 | 16.87  18.35  20.95  22.75  24.22  26.14  28.41
fingerprint 2  | 16.16  16.99  19.35  20.93  22.08  24.26  26.71 | 17.10  18.97  21.74  23.57  25.12  27.16  29.53
Flintstones    | 15.60  16.07  17.66  18.88  19.88  21.96  24.45 | 15.98  17.02  18.93  20.32  21.54  23.40  25.58
grass          | 11.12  11.29  12.08  12.76  13.34  14.63  16.36 | 11.30  11.84  13.09  13.99  14.80  16.06  17.66
hill           | 23.43  23.96  25.50  26.52  27.26  28.73  30.46 | 23.94  25.25  26.94  27.97  28.81  30.04  31.55
house          | 22.29  22.77  24.36  25.61  26.61  28.76  31.36 | 22.69  24.02  26.49  28.26  29.77  31.76  33.90
Lena           | 23.17  23.83  25.71  27.00  27.95  29.89  32.11 | 23.74  25.24  27.26  28.62  29.71  31.30  33.20
man            | 22.28  22.75  24.23  25.26  26.04  27.57  29.45 | 22.75  23.97  25.65  26.71  27.60  28.88  30.50
matches        | 19.34  20.22  22.63  24.02  24.96  26.68  28.62 | 19.92  21.74  23.96  25.23  26.22  27.66  29.39
shuttle        | 24.58  25.55  28.27  30.20  31.64  34.50  37.99 | 25.27  27.44  30.48  32.53  34.24  36.78  39.82


TABLE IV 

Recovery PSNRs for SET4. PSNRs are in dB. Percentage values are sampling ratios. 



               | Universal dictionary                            | Adaptive dictionary
Image          | 5%     10%    20%    25%    30%    40%    50%   | 5%     10%    20%    25%    30%    40%    50%
Barbara        | 21.19  21.89  23.78  24.90  25.66  27.06  28.40 | 21.31  21.90  24.35  25.81  26.86  28.68  30.53
boat           | 22.38  23.43  25.98  27.36  28.25  29.78  31.24 | 22.47  23.40  26.22  27.60  28.49  30.09  31.84
bridge         | 20.52  21.23  22.89  23.79  24.38  25.40  26.41 | 20.65  21.32  22.99  23.89  24.48  25.60  26.92
couple         | 22.49  23.48  25.94  27.32  28.23  29.82  31.44 | 22.56  23.50  26.32  27.65  28.50  30.06  31.79
fingerprint 1  | 17.15  18.45  22.08  24.17  25.52  27.94  30.51 | 17.24  18.74  23.51  25.33  26.56  28.67  31.04
fingerprint 2  | 17.13  18.40  21.52  23.10  24.11  25.85  27.71 | 17.21  18.93  24.74  26.79  28.29  30.52  33.15
Flintstones    | 16.58  17.80  21.05  22.84  23.98  25.83  27.40 | 16.62  17.77  21.38  23.12  24.21  26.04  27.77
grass          | 11.84  12.43  14.08  15.12  15.84  17.13  18.35 | 11.99  12.59  14.44  15.53  16.29  17.67  19.25
hill           | 24.25  25.19  27.43  28.62  29.39  30.72  32.07 | 24.33  25.20  27.77  28.88  29.59  30.94  32.49
house          | 23.22  24.33  27.38  29.11  30.23  32.16  34.11 | 23.33  24.47  28.70  30.81  32.19  34.20  36.16
Lena           | 24.06  25.33  28.62  30.33  31.39  33.18  34.96 | 24.14  25.40  29.08  30.65  31.62  33.32  35.11
man            | 23.19  24.18  26.59  27.85  28.68  30.12  31.61 | 23.28  24.19  26.83  28.03  28.80  30.24  31.87
matches        | 20.23  21.54  24.92  26.53  27.46  29.04  30.66 | 20.27  21.54  25.59  26.94  27.82  29.31  30.94
shuttle        | 25.53  27.02  31.63  34.32  36.04  39.03  42.09 | 25.60  27.37  32.87  35.27  36.88  39.55  42.48


TABLE V 

Recovery PSNRs for SET5. PSNRs are in dB. Percentage values are sampling ratios.
Fig. 4. Test images. From left to right and top to bottom: ‘Barbara’, ‘boat’, ‘bridge’, ‘couple’, ‘fingerprint 1’, ‘fingerprint 2’, ‘Flintstones’,
‘grass’, ‘hill’, ‘house’, ‘Lena’, ‘man’, ‘matches’ and ‘shuttle’.


at 1000 with other learning parameters set at their default values [16]. For example, the mini-batch size for the
online learning algorithm is 400 patches and $\lambda_1 = 0.15$.

SET2: This setting is identical to SET1, except that we do not perform pre-processing on the test images. 
Therefore, these simulations are closer to the real case where there is no guarantee test images are sparse under 
any finite dictionary. 

SET3: In this setting, for the initial dictionary, we have used a redundant dictionary that was trained over a large
set of training images using the K-SVD method [4]. Testing with the K-SVD dictionary helps in benchmarking the
performance of the proposed method.

SET4: In addition to testing the proposed DL-M algorithm with the state-of-the-art learned dictionaries, in this
setting we test the algorithm with an ordinary orthogonal Discrete Cosine Transform (DCT). The advantages of
using the DCT as the initial dictionary are the low memory cost of storing the dictionary and faster learning due to
the small size of the dictionary.

SET5: Finally, we test the algorithm for the image inpainting problem. In this scenario, the measurements
are acquired in the standard basis, i.e. a portion of the pixels is measured. Pixel-wise sampling corresponds to the
most practical acquisition approach. However, the $\Phi_j$'s would not satisfy the sub-Gaussian concentration
inequalities, resulting in a weaker performance than was promised in this paper.
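
In the inpainting setting, each sensing matrix reduces to a selection of rows of the identity matrix. The sketch below illustrates this under the assumption that the measured pixels of a block are drawn uniformly at random without replacement; the function names and the random block are hypothetical.

```python
import random

def pixel_sampling_indices(n, m, rng):
    """Pick m distinct pixel positions out of n; they define the sensing matrix."""
    return sorted(rng.sample(range(n), m))

def selection_matrix(n, indices):
    """Explicit m x n sensing matrix whose rows are rows of the identity."""
    return [[1.0 if col == i else 0.0 for col in range(n)] for i in indices]

def measure(block, indices):
    """Measurements of a vectorized block: simply the observed pixel values."""
    return [block[i] for i in indices]

rng = random.Random(0)
n, m = 64, 32                                # 8 x 8 block at a 50% sampling ratio
block = [rng.random() for _ in range(n)]     # stand-in for a vectorized image block
idx = pixel_sampling_indices(n, m, rng)
Phi = selection_matrix(n, idx)
y = measure(block, idx)
```

Each row of such a matrix has a single non-zero entry, which is why it behaves very differently from a dense Gaussian matrix.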


B. Results and discussion 

The recovery PSNR results for every setting, averaged over 20 trials for each case, are presented in Tables I
through V. Since the block size is $n = 64$, the sampling ratios 5%, 10%, 20%, 25%, 30%, 40% and 50% respectively
correspond to $m = 3$, $m = 6$, $m = 12$, $m = 16$, $m = 19$, $m = 25$ and $m = 32$ measurements per block. The
average running time on an Intel Core i5-3470 CPU with a clock speed of 3.2 GHz, using Lasso on the Python platform
developed for [17], was around 40, 50 and 80 seconds for sampling ratios of 5%, 20% and 50%, respectively.
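
The mapping from sampling ratio to the number of measurements per block can be checked with a short calculation. The sketch assumes that the product of the ratio and $n$ is rounded down, which is consistent with every value listed above; the function name is hypothetical.

```python
def measurements_per_block(percent, n=64):
    """Measurements per block for a sampling ratio given in percent, rounded down."""
    return (percent * n) // 100

ratios = [5, 10, 20, 25, 30, 40, 50]
m_values = [measurements_per_block(r) for r in ratios]
```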

Generally speaking, DL-M results in noticeable block-CS recovery improvements for most images. We found that
most cases would benefit from a higher $T_{\max}$, while some recovery results saturate or even degrade after a
certain number of iterations between 10 and 20. This is expected since, as we showed in Section III, the image
statistics can influence the DL-M performance and its resistance against overfitting. Not surprisingly, our best
performance is achieved in SET1, where the image is guaranteed to allow an exact sparse representation (although
unknown to the solver).

As the number of measurements per block is increased, the PSNR gain becomes more significant. This behavior is
expected since, with more measurements, the DL-M problem becomes closer to the well-posed DL problem, i.e.
DL based on the complete knowledge of the underlying image. Meanwhile, it is crucial that the PSNR gain stays
positive for very low sampling ratios, e.g. for $m = 3$ and $m = 6$. As can be seen, the proposed DL-M algorithm
is reliable at every sampling ratio and resistant to overfitting for most test images.
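
As a worked example of the PSNR gain, the snippet below subtracts the universal-dictionary PSNRs from the adaptive-dictionary PSNRs for one image. The numbers are copied from the Barbara row of Table I (SET1); the variable names are hypothetical and the snippet is only meant to make the trend concrete for this single row.

```python
# Barbara, SET1 (Table I), sampling ratios 5%, 10%, 20%, 25%, 30%, 40%, 50%.
universal = [22.14, 23.01, 25.19, 26.52, 27.43, 29.19, 31.09]
adaptive = [22.20, 23.22, 25.99, 27.88, 29.39, 32.05, 34.85]

# PSNR gain in dB of the adaptive dictionary over the universal one.
gains = [round(a - u, 2) for u, a in zip(universal, adaptive)]
```

For this row the gain is positive at every sampling ratio and grows with the number of measurements per block.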

Finally, our inpainting results (SET5) can be compared to the BCS results in [7], which uses the one-block-sparse
signal model.^10 In [7], the recovery PSNR for Barbara’s image with $n = 64$, $p = 256$ and $m = 32$ was
reported at 27.93 dB, which is lower than the recovery PSNR using the non-adaptive universal dictionary reported in
this paper. In addition, the inpainting algorithm proposed in [7] uses overlapping block-CS recovery, which
significantly increases $N$, resulting in a performance boost at the cost of more computational complexity.
Meanwhile, we have employed a non-overlapping framework to stay consistent with the general block-CS recovery, where
the measurement matrices can take any form and overlapping block recovery is not always possible.


VI. Conclusion 

The analysis and the empirical experiments presented in this paper show that blind compressed sensing of natural
images can be performed reliably without additional constraints on the dictionary or the sparse coefficients. Our
results generalize the BCS frameworks in [6] and [7] by providing the probability of uniqueness, as well as accuracy.
Meanwhile, we showed that dictionary learning is ill-posed when fixed block measurements are employed. The
results of this paper can be extended to work with dense measurement matrices using (5). There are at least two
main directions to improve and build upon the work reported in this paper:

• Deterministic sensing matrices: We studied the class of random independent Gaussian sensing matrices for
sensor design. Meanwhile, deterministic designs of sensing matrices are proven to perform equally well [32]
while addressing the implementation constraints.


• Online DL-M: It has been known [17] that online gradient descent is more efficient than batch gradient descent
for large data sets, especially when the data becomes available in a streaming fashion. Therefore, an extension
to online DL-M, both algorithmically and analysis-wise, could be the next step.

10 Unfortunately, it is not possible to compare our results with the method of [8] since its code has not been released and the simulation
settings and evaluation metrics in [8] are different from ours.